Uncertain Data Integration with Probabilities
Abstract
Real-world applications that deal with information extraction, such as business intelligence software or sensor data management, must often process data provided with varying degrees of uncertainty. Uncertainty can result from multiple or inconsistent sources, as well as from approximate schema mappings. Modeling, managing, and integrating uncertain data from multiple sources has been an active area of research in recent years [6][7][1][2]. In particular, data integration systems provide a uniform query interface to the integrated information, freeing the user from the tedious tasks of finding relevant data sources, interacting with each source in isolation through its own interface, and combining data from multiple sources [5]. Previous work has integrated uncertain data using representation models such as possible worlds and probabilistic relations [12][1][2]. We extend this work by determining the probabilities of the possible worlds of an extended probabilistic relation. We also present an algorithm that determines when a given extended probabilistic relation can be obtained by integrating two probabilistic relations, and that returns the corresponding decomposed pairs of probabilistic relations.
Acknowledgements
I sincerely thank my advisor, Dr. Fereidoon Sadri, for his abundant guidance and support throughout the course of this research, without which this thesis would not have been possible. I am very thankful to him for giving me the opportunity to work with him and for believing in me. I thoroughly enjoyed working under him. I would also like to thank Dr. Jing Deng and Dr. Nancy Green for their valuable guidance and feedback. I am indebted to my husband for his constant encouragement, support, and love. I am very grateful to my mother and my family members for their unconditional support and care. I want to thank Nina Revankar and her family for their ample love and concern during my study. Nina's spontaneous gestures of help on those busy days really made a big difference, and I am deeply indebted to her for it. I would like to express my gratitude to my friends at school for cheering me up and encouraging me throughout. I am highly grateful to all my dear friends in Greensboro for keeping my life outside school fun at all times, and for their endless concern and support.
Related Resources
Probabilistic Data Integration Systems
Current data integration techniques are successful at managing well-defined and well-understood data integration tasks, but do not cope well with uncertainty. However, the amount of uncertain data is growing with the number and variety of data sources being integrated, both in traditional data integration tasks such as enterprise data integration, and in next generation integration problems, such as c...
Identifying Interesting Instances for Probabilistic Skylines
Uncertain data arises from various applications such as sensor networks, scientific data management, data integration, and location-based applications. While significant research efforts have been dedicated to modeling, managing, and querying uncertain data, advanced analysis of uncertain data is still in its early stages. In this paper, we focus on skyline analysis of uncertain data, modeled as...
Indexing Probabilistic Nearest-Neighbor Threshold Queries
Data uncertainty is inherent in many applications, including sensor networks, scientific data management, data integration, and location-based applications. One common query for uncertain data is the probabilistic nearest neighbor (PNN) query, which returns all uncertain objects with a non-zero probability of being the nearest neighbor (NN). In this paper, we study the PNN query with a probability threshold (PNNT), wh...
Probabilistic Local Features in Uncertain Vector Fields with Spatial Correlation
In this paper, methods for the extraction of local features in crisp vector fields are extended to uncertain fields. While in a crisp field local features are either present or absent at some location, in an uncertain field they are present with some probability. We model sampled uncertain vector fields by discrete Gaussian random fields with empirically estimated spatial correlations. The variabili...
Optimizing Probabilistic Query Processing on Continuous Uncertain Data
Uncertain data management is becoming increasingly important in many applications, in particular, in scientific databases and data stream systems. Uncertain data in these new environments is naturally modeled by continuous random variables. An important class of queries uses complex selection and join predicates and requires query answers to be returned if their existence probabilities pass a t...